Simplifier: a web tool to eliminate redundant NGS contigs
Abstract
Modern genomic sequencing technologies produce large amounts of data at a reduced cost per base; however, this data consists of short reads. The shorter read length, compared with reads obtained by previous methodologies, presents new challenges, including the need for efficient algorithms that assemble genomes from short reads and resolve repeats. Additionally, after ab initio assembly, curating the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, stand-alone software that selectively eliminates redundant sequences from the collection of contigs generated by ab initio assembly of genomes. Applying Simplifier to data from the assembly of the genome of Corynebacterium pseudotuberculosis strain 258 reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the mate-paired library by 17.47% and the fragment library by 23.91%. In tests with data from prokaryotic organisms, Simplifier removed redundant sequences from datasets produced by assemblers, thereby reducing the effort required to finish the genome assembly. Availability: Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml. It requires Sun JDK 6 or higher.
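The abstract does not describe how Simplifier decides that a contig is redundant, so the following is only a minimal sketch of the general idea under a simple assumption: a contig counts as redundant when it is an exact substring of a longer (or identical) contig on either strand. The function names `reverse_complement`, `remove_contained`, and `n50` are hypothetical and are not part of Simplifier; the N50 helper shows the metric the abstract reports before and after filtering.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def remove_contained(contigs: list[str]) -> list[str]:
    """Drop contigs that are exact substrings of an already kept contig, on either strand.

    Naive O(n^2) illustration only; this is an assumed redundancy rule,
    not Simplifier's actual algorithm.
    """
    kept: list[str] = []
    # Visit the longest contigs first so shorter ones are compared against them.
    for contig in sorted(contigs, key=len, reverse=True):
        rc = reverse_complement(contig)
        if not any(contig in k or rc in k for k in kept):
            kept.append(contig)
    return kept


def n50(contigs: list[str]) -> int:
    """Length L such that contigs of length >= L cover at least half of all bases."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty input


if __name__ == "__main__":
    # Toy contigs: one duplicate, one forward substring, one reverse-strand substring.
    toy = ["ACGTACGTTT", "CGTACG", "AAACGT", "GGGCCC", "ACGTACGTTT"]
    filtered = remove_contained(toy)
    print(len(toy), "->", len(filtered), "contigs; N50 =", n50(filtered))
```

A real tool would also need to handle partial overlaps, sequencing errors, and large inputs, for which exact substring tests against every kept contig would be too slow.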